Q-Learning

Q-Learning (Show when cumreward > 100)

SARSA

SARSA (Show when cumreward > 100)

Report

In this lab, I ran some experiments with the Q-Learning and SARSA algorithms.

First, I ran Q-Learning and SARSA for 40000 episodes and recorded the play time (lifetime) and reward.
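The bookkeeping for this experiment can be sketched as follows. This is only a minimal illustration, not the lab's actual code: `play_episode` is a hypothetical callable that plays one game with a given agent and returns that game's lifetime and cumulative reward.

```python
from statistics import mean


def record_episodes(play_episode, n_episodes):
    """Call play_episode() n_episodes times and collect the results.

    play_episode is a hypothetical callable returning
    (lifetime, cumulative_reward) for a single game.
    """
    lifetimes, rewards = [], []
    for _ in range(n_episodes):
        lifetime, cum_reward = play_episode()
        lifetimes.append(lifetime)
        rewards.append(cum_reward)
    return lifetimes, rewards


def summarize(lifetimes, rewards):
    """Average lifetime and reward over all recorded episodes."""
    return {"mean_lifetime": mean(lifetimes), "mean_reward": mean(rewards)}
```

Running `record_episodes` once per algorithm with `n_episodes=40000` and comparing the two `summarize` results would give the kind of comparison described in the next part.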

Compare the lifetime and reward

I found that, within the 40000 episodes, Q-Learning actually learns faster than SARSA: its average lifetime is clearly higher than SARSA's, and so is its average reward.

Compare the behavior

For the second experiment, I wanted to show only the games with a reward over 100 for both algorithms. In the process, however, I encountered the problem mentioned by the TA: the agents play for too long.
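One possible way to handle both points, given here only as a sketch and not necessarily what was done in this lab, is to cap the number of steps per game and then keep only the games whose cumulative reward exceeds the threshold. The names `step_fn`, `max_steps` and `should_show` are hypothetical.

```python
def play_capped(step_fn, max_steps):
    """Play one game, stopping when it ends or after max_steps steps.

    step_fn is a hypothetical callable that advances the game by one
    step and returns (reward, done).
    """
    cum_reward = 0
    steps = 0
    done = False
    while not done and steps < max_steps:
        reward, done = step_fn()
        cum_reward += reward
        steps += 1
    return steps, cum_reward


def should_show(cum_reward, threshold=100):
    """True when a game's cumulative reward is strictly over the threshold."""
    return cum_reward > threshold
```

With a cap in place, a well-trained agent that would otherwise never crash still produces a finite game that can be shown.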

Then I show a demo for both algorithms. From my observation, when SARSA plays, the bird 'prepares' for the next pipe early, to avoid the 'cliffs'. When Q-Learning plays, the bird flies a little more riskily.
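A likely explanation for this difference lies in the update targets of the two algorithms. The sketch below shows the standard textbook form of both targets. It assumes a tabular Q-table stored as a dict of per-state lists, and it leaves out terminal-state handling. The names `q`, `alpha` and `gamma` are illustrative and not taken from the lab code.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: assumes the best next action will be taken."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action the agent actually takes next,
    including exploratory (possibly bad) ones."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target."""
    q[state][action] += alpha * (target - q[state][action])
```

Because the SARSA target includes the value of exploratory actions, states close to a crash look worse to SARSA than to Q-Learning, which would be consistent with the more cautious flying observed above.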